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Chapter  1 


Introduction 

1.1  Background 

In  many  application,  there  is  more  than  one  signal  source  in  a  system  and  the 
system  introduces  noise  and  different  time  delays  from  the  signal  sources  to  the 
observer  (or  sensor).  It  is  desirable  to  separate  mixtures  and  recover  the  original 
data  as  closely  as  possible.  In  most  cases,  the  original  source  and  the  system 
transfer  function  are  unknown.  Such  a  separation  problem  is  called  Blind  Source 
Separation/Deconvolution  (BSS/BSD),  depending  on  whether  the  delay  effect  is 
considered  in  the  model  or  not. 

The  BSS(BSD)  problem  has  been  receiving  increasing  attention  in  the  last 
decade  because  of  applications  in  speech  enhancement  and  recognition,  biomedical 
signal  analysis,  image  processing  and  telecommunication. 

In  the  work  I  will  discuss  here  the  major  concern  is  with  the  case  of  natu¬ 
ral  sound  sources  mixed  in  the  process  of  air  transmission  in  a  closed  space  and 
received  by  a  group  of  microphones  in  the  same  space.  Different  models  and  separa¬ 
tion/deconvolution  algorithms  are  analyzed  and  compared  in  such  an  environment 
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and  a  new  algorithm  that  makes  use  of  the  statistical  distribution  characteristics  of 
natural  sound  combined  with  a  frequency  domain  analysis  is  used  to  achieve  better 
results  in  recovering  original  data  in  real  time.  Many  open  questions,  including 
extension  to  time  varying  systems,  changing  number  of  sources,  noise  reduction, 
signal  extraction  and  varying  receiver  location,  are  addressed  to  achieve  a  practical 
and  robust  solution. 


1.2  Overview 

In  an  environment  with  n  sound  sources,  m  sensors  are  placed.  The  goal  is  to  sepa¬ 
rate  sound  from  different  sources  and  recover  the  original  signals.  We  represent  the 
sources  as  si(f),  ^(t), ...,  sn(t).  Since  the  sound  sources  are  physically  independent 
of  each  other,  we  assume  the  signals  are  mutually  statistically  independent.  The 
original  signals  Si(t),  S2(t), ..,  sn(t)  are  unobservable.  The  m  outputs  from  sensors 
are  denoted  by  x\(t) ,  X2(t) ,  ...xm(t).  These  also  intensity  functions  of  time  and  they 
are  linear  combination  of  sources. 

The  model  can  be  written  as  Xi(t )  =  Yj]=i  Sj{t)a.ij  +  rq(f),  i  =  1,  2, ...,  m,  where 
rii(t)  is  observation  noise.  The  matrix  {aij}  represents  the  mixing  in  the  envi¬ 
ronment.  Since  the  environment  may  be  changing,  {a^}  in  general  is  a  func¬ 
tion  of  time.  Let  x  =  [aq(f),  •••,  xm(t)]T  be  the  observation  data  vector, 

s  =  [si(t),  S2(t), ...,  sn(t)]T  the  source  vector  and  A  =  {a^}  be  a  time-varying 
matrix.  We  have  the  vector  form 


x  =  As  (1.1) 

A  related  task  is  blind  deconvolution  where  we  suppose  the  mixing  channel  has 
delay.  In  this  model  the  observation  x(k)  is  assumed  to  be  produced  from  s(k )  in 
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the  following  manner 


x(k)  =  Hps(k-p)  (1.2) 

p=— OO 

Where  each  Hp  is  an  n  x  m  matrix  of  unknown  mixing  coefficients  at  lag  p. 
The  goal  is  to  calculate  y(k)  =  [y\{k)  ■  ■  ■ ym(k)]T  of  possibly  scaled  and  delayed 
estimates  of  the  source  signals  in  s(k)  from  x{k )  using  a  causal  FIR  filter  given  by 

y{k)  =  Y,WP{k)x{k-p )  (1.3) 

p=0 

where  Wp(k),  0  <  p  <  L  is  a  (m  x  n)  matrix  satisfy 

W(z)H(z)  =  PA{z) 

where  P  is  a  permutation  matrix  and  A(z)  is  a  diagonal  matrix  with  A iZ~Ti  as  the 
diagonal  entries. 

There  are  several  important  assumptions  about  this  BSS/BSD  model: 

1.  The  source  signals  are  mutually  statistically  independent  for  each  sample. 

2.  At  most  one  of  the  source  signals  has  a  Gaussian  distribution. 

3.  Each  source  signal  is  a  stationary  stochastic  process. 

4.  Besides  being  stationary,  the  signals  are  all  ergodic  so  that  time  averages  can 
be  used  to  estimate  statistical  distributions. 

In  Chapter  2  different  ways  to  solve  the  BSS/BSD  problem  will  be  analyzed. 
In  Chapter  3  an  improved  subband-based  Independent  Component  Analysis  (ICA) 
algorithm  will  be  discussed.  In  Chapter  4  we  describe  the  real  world  environment 
and  provide  details  on  implementation  of  a  real-time  ICA  algorithm  for  real-world 
signals.  In  Chapter  5  the  test  results  will  be  discussed  and  some  conclusions  are 
drawn. 
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Chapter  2 


Analysis  and  Comparsion  of 
BSS/BSD  Algorithms 


The  essential  goal  of  BSS/BSD  problem  is  to  recover  unknown  source  signals  from 
mixed  observations. 

In  this  chapter  the  instantaneous  mixing  model  described  below  will  be  used 
unless  stated  otherwise: 


x  —  As  +  n  (2.1) 

where  s  is  the  vector  of  original  signals,  x  is  the  observed  signals,  A  is  the  unknown 
mixing  matrix,  a  simplified  representation  of  environment,  and  n  is  the  observation 
noise. 

The  corresponding  separated  system  should  be: 

y  =  Wx  (2.2) 

where  W  is  the  separation  matrix  and  y  is  the  recovered  signal. 
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2.1  Preprocessing 


Before  applying  the  ICA  algorithm  on  the  data  some  preprocessing  of  the  data  can 
help  to  reduce  the  noise,  accelerate  the  convergence  speed  and  reduce  computation. 
The  typical  ways  are: 

2.1.1  Whitening 

Before  ICA,  the  standard  Principle  Component  Analysis  (PCA)  algorithm  can  be 
applied  on  the  data.  PCA  produces  uncorrelated  signals.  This  is  closer  to  the  final 
goal  of  ICA,  independent  signals,  as  compared  to  the  raw  source  data. 

Denote  the  covariance  matrix  of  sources  as 

Rss  =  Es(t)sT(t )  (2.3) 

where  Rss  is  a  positive  definite  diagonal  matrix.  The  covariance  matrix  of  the 
observed  signal  is 

Rxx  =  Ex(t)xT(t)  =  HRssHt  (2.4) 

PCA  searches  for  an  orthogonal  matrix  Q  such  that  the  components  of 

y{t)  =  Qx{t) 

are  uncorrelated.  In  other  words  the  covariance  matrix  of  the  output  y  is  diagonal. 

Ryy  =  E[y(t)yT(t)\  =  QRXXQ 1  =  V  (2.5) 

In  the  case  where  the  original  signals  are  Gaussian,  PCA  results  in  independent 
output.  In  general  further  computation  is  required  on  y  to  obtain  independent 
output  components. 

Let 

x(t)  =  V~l^2QTx{t)  (2.6) 
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such  that 


Ex{t)xr(t )  =  I 


(2.7) 


Thus 

y  =  Wx(t)  (2.8) 

extracts  the  independent  signals,  where  W  is  the  separating  matrix. 

In  the  work  of  Koehler,  Lee  and  Orglmcister  [15],  the  mixed  signal  is  further 
decorrelated  by  fourth  order  statistics  before  the  ICA  algorithm  is  applied.  Let 

W4  =  (E[\\x\\2xxt})-1/2  (2.9) 

and 

x  =  WAx  (2.10) 

is  the  input  for  core  ICA  algorithm. 

2.1.2  Filtering  and  de-noising 

If  the  frequency  range  of  signals  is  known  a  band  pass  filter  can  be  used  to  remove 
all  the  noise  with  spectrum  outside  of  that  frequency  range.  While  if  noise  is 
narrow-band  a  band  stop  filter  can  be  used  to  remove  it. 

For  white  noise,  thresholding  combined  with  a  wavelet  method  is  effective.  The 
method  proposed  by  David  Donoho  [10],  removes  Gaussian  noise  and  produces  an 
estimation  that  is  balanced  between  the  two  goals  of  smoothness  and  small  bias. 

This  method  produces  very  good  results  in  the  presence  of  noise  and  will  be 
further  discussed  in  Chapter  3. 


6 


2.2  Error  functions 


Since  the  signals  from  different  sources  are  assumed  to  be  independent,  the  goal  of 
the  BSS  problem  is  equivalent  to  achieving  independent  output  by  multiplying  the 
observed  signals  with  an  invertible  matrix  or  applying  a  filter  on  the  observation. 
However  the  definition  of  independence 

P{x i,x2,  ...,xn)  =  P(x1).P(x2)...P(xn)  (2.11) 

is  not  directly  usable  in  an  error  function  because  the  statistical  distribution  of 
the  original  signal  is  unknown  and  cannot  be  estimated  from  the  mixed  signal. 
Different  approaches  have  been  thus  suggested  to  achieve  an  implcmentablc  error 
function  as  below. 

2.2.1  Cumulant 

Jutten  and  Herault,  inspired  by  neurobiological  research,  first  proposed  this  heuris¬ 
tic  learning  algorithm  to  solve  the  BSS  problem  using  the  cumulant  as  the  loss 
function  [14], 

In  contrast  with  PCA,  which  decorrelates  signals  based  on  the  covariance  ma¬ 
trix,  in  ICA  higher  order  statistics  are  used  to  find  the  independent  components. 
The  error  function  for  PCA  is: 


L(W)  =  yl(k)  (2.12) 

The  convergence  of  E(^)  =  2qmiE(y,(t,  f)yk(t,  /))  =  0  means  </,((,/)  and 
Hk(t,  f)  are  uncorrelated,  as  we  can  see  from  section  2.1.1  about  prewhitening. 
To  achieve  the  independence  between  yi(t,  f)  and  yk(t,f),  we  use  two  different 
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nonlinear  and  odd  functions 


=  af(yi(t,  f))g(yk(t ,  /)) 

tanh(y)  and  y3  are  the  commonly  used  non-linear  functions. 

2.2.2  Kurtosis 

Kurtosis  is  a  measure  of  how  far  a  statistical  distribution  is  away  from  a  Gaussian 
distribution.  According  to  the  Central  Limit  Theorem,  the  sum  of  n  independent 
identically  distributed  random  variables  has  a  Gaussian  distribution  as  n  goes  to 
infinity.  For  n  independent  but  not  identically  distributed  random  variables,  if 
their  PMF’s  satisfy  certain  conditions,  then  the  central  limit  theorem  still  holds. 

A  Gaussian  random  variable  has  kurtosis  equal  to  0.  The  closer  kurtosis  is  to 
0,  the  closer  the  random  variable  is  to  a  Gaussian  random  variable.  Kurtosis  can 
then  be  used  as  a  cost  function  for  separation  and  deconvolution.  It  is  used  more 
effectively  for  deconvolution  and  will  be  discussed  later  in  Section  2.5. 

2.2.3  Kullback-Leibler  divergence 

The  Kullback-Leibler  divergence  (  also  called  relative  entropy)  between  two  statis¬ 
tical  distributions  is  defined  as 

D(p  ||  q)  —  J  p(x)  log (p(x) / q(x))dx 

Although  it  is  not  symmetric  and  does  not  satisfy  the  triangle  property,  Kullback- 
Leibler  divergence  is  still  considered  a  measure  of  “distance”  between  two  statistical 
distributions. 

To  achieve  independent  output  means  to  minimize  the  K-L  divergence  between 
the  joint  distributions  of  the  estimated  sources  y  and  the  product  of  the  marginal 
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distributions  of  the  yt 


D 


fvWfy 


fy(y )  log(- 


fv(y) 


-)  =  -H(y)  +  Y,h(Vi))  (2-13) 


THU  fyXVi)  i=  1 

where  if  (.)  is  entropy  of  the  joint  probability  distribution. 

ft  can  be  shown  that  the  above  error  function  is  consistent  with  the  following 
[9]: 

I{y,x)  =  H{y)-H{y/x)  (2.14) 


where  I(y,x)  is  the  mutual  information  in  output  y  about  input  x,  H(y)  is  the 
entropy  of  y,  H(y/x )  is  the  entropy  of  output  y  that  did  not  come  from  input  x. 


We  thus  have 

JLI(v  x)  =  aJM 

aw  {v<  ’  aw 

since  H(y/x )  does  not  depend  on  W. 


(2.15) 


2.2.4  Maximum  likelihood 

To  estimate  one  set  of  data  from  another  set,  the  likelihood  function  is  highly 
useful.  It  is  defined  as  a  product  of  factors  obtained  by  marginalizing  over  the 
latent  variable.  In  this  case 


p(x\A)  =  Y[pi(xi\A)  (2.16) 

i= 1 


In  our  goal  is  to  achieve  W  =  A  1 


p(y\x,W  ^  =  Y[pi(yi\x,W  :) 

i= 1 


From  this  we  take  the  negative  log  likelihood  as  error  function 


l(y,W)  =  —  log  \det(W)\  -J2l°gPi(yi) 

i=  1 


(2.17) 


(2.18) 
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L(W)  =  E[l(y,W)\ 


(2.19) 


Here  W  e  Gl(n ),  the  matrix  Lie  group  of  n  x  n  nonsingular  matrices. 

Remark:  In  [16]  David  Mackay  proved  this  error  function  is  consistent  with 
K-L  distance. 


2.3  Learning  algorithms 

2.3.1  Stochastic  gradient  method 

For  the  cost  function  L(W),  the  algorithm  derived  from  steepest  descent  method 
is: 

c  ,m 

dt  dW  '  '  1 

The  error  function  L(W)  usually  contains  the  expectation  operation  which 
can  not  be  implemented  in  the  process  of  learning.  Since  the  original  signal  is 
considered  to  be  ergodic,  time  average  can  be  used  to  replace  the  expectation. 

2.3.2  Natural  gradient  method 

The  natural  gradient  algorithm  was  proposed  by  Shunichi  Amari  in  1995.  In  [1] 
Amari  proved  this  algorithm  possesses  the  equivariance  property  and  achieves  the 
Fisher  Efficiency. 

An  ideal  property  for  a  learning  algorithm  is  covariance  or  equicovariance,  which 
means  the  algorithm  should  give  the  same  results  independent  of  the  units  in 
which  quantities  are  measured.  The  steepest  descent  rule  does  not  give  a  covariant 
algorithm,  because  the  two  sides  of  the  equation  are  not  consistent  in  dimension. 
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Suppose 


A  Wi  =  fl¬ 
ow 


(2.21) 


Then  choose  p  of  a  particular  dimension  will  only  result  in  a  covariant  algorithm 
if  all  the  wy  have  the  same  dimension.  However  if  we  take 

dL 

A  Wi  =  1X^2  (2-22) 

3  1 

where  M  is  a  matrix  whose  (i,j)th  element  has  dimension  \wiUij\,  then  the 
algorithm  is  covariant.  There  are  two  ways  to  get  such  matrices,  namely  metric 
method  and  curvature  method. 

Newton’s  algorithm  is  an  example  of  getting  M  from  curvature.  In  this  algo¬ 
rithm  we  have 

A  =  — VVL  and  M  =  A-1 

In  the  natural  gradient  algorithm  we  get  the  matrix  M  from  metrics. 

The  steepest  descent  direction  of  a  function  in  a  Riemannian  space  is  given  by 


— VL(  w)  =  -G~\w)VL(w) 


(2.23) 


where  G  is  the  Riemannian  metric  and  VL(w)  is  the  standard  gradient.  Using 
equation  2.18  and  2.19,  by  taking  derivative  on  both  sides  we  have: 


dl 


-d(\og\det(W)\)  PM) 

i=  1 


-tr{dWW~l)  +  <j)(y)Tdy 


(2.24) 

(2.25) 


where 

=  ~-rlo&  ipM) 

dyi 

The  Riemannian  metric  of  a  statistical  model  is  defined  by  the  Fisher  information 
matrix  [9].  That  is. 


d  log p(x,  w)  d  log p(x,w) 

QijH  =  Ei - - - ) 


d  Wi 


dwi 


(2.26) 
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Then 


dX  =  dWW~l  (2.27) 

is  to  be  viewed  as  an  element  of  the  cotangent  space  to  Gl(n)  at  the  identity  /. 
Define  the  co- metric 

(dX,  dX)j  =  tr(dXdXT)  (2.28) 

then 

(dW,  dW)w  =  tr((dWW~1)(dWW~1)T)  (2.29) 

With  respect  to  this  metric  we  get  the  gradient 

A  W  =  [I  -  <t>(y(t))yT(t)]W  (2.30) 

The  natural  gradient  algorithm  is  given  by  the  following  : 

Wt+1  =  Wt-  7 ](t)F{y(t),  Wtj  (2.31) 

where 

F{y(t),  Wt}  =  [I  -  </>{y(t))yT(t)]W  (2.32) 

2.3.3  Non-holonomic  learning 

In  the  last  section  we  derived  the  natural  gradient  algorithm.  However,  this  learn¬ 
ing  algorithm  can  not  distinguish  W  with  AIT,  where  A  is  a  diagonal  matrix  with 
nonzero  diagonal  entries.  Thus  given  W,  the  equivalence  class  Sw  =  {W'\yV'  = 
Alb}  is  a  n-dimensional  subspace  of  S,  where  S  =  {W}  =  Gl(n).  Shunichi  Arnari, 
T.P.  Chen  and  A.  Chichocki  found  a  way  to  improve  learning  performance  by 
adding  non-honolomic  constraints  to  the  natural  gradient  algorithm  in  [3]. 

Three  kinds  of  possible  constraint  may  be  used  to  restrict  the  search  direction 
of  the  above  algorithm. 
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1.  hard  restriction 


fi(W)  =  0,  i  =  1, n 
with  the  typical  and  most  simple  choice 

MW)  =  wti  - 1 

2.  soft  restriction 

E(MVi))  =  0,  i  =  l,  ...,n 

The  typical  choice  is 

MVi )  =  Vi  ~  1 

which  guarantees  the  variation  of  the  recovered  signals  are  1. 

3.  non-holonomic  constraint 

Define  AX(  =  AWtWf 1 .  Then  the  constraints  are  given  by 

AXu  —  0,  i  —  1,  ...n 

This  constraint  implies  that  the  trajectories  of  the  dynamics  are  always  or¬ 
thogonal  to  Sw  while  not  being  restricted  to  a  fixed  sub-manifold. 

Any  matrix  can  be  uniquely  represented  by  the  sum  of  two  matrices,  a  diagonal 
matrix  and  a  matrix  with  all  diagonal  elements  zero. 

Thus  we  can  write 

Gl(n)  =  A  +  B  (2.33) 
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where 


A  —  {A  G  77  \A  —  dittQ^Cb i,  Ci2i  •  ••,  On}} 

B  =  {Be  nnxn\bu  =  b22  =  ...  =  bnn  =  0} 

Claim 

[B,  B\  =  sl(n)  (2.34) 

where 

[B,  B]  =  {x  e  B.y  e  B,  [x,  y\  =  xy  -  yx} 

Proof: 

Suppose  Z  G  sl(n)  i.e.  J27=i  zu  =  znn- 

Denote  Itj  as  a  n  x  n  matrix  with  the  (i,  j)th  element  as  1  and  other  elements 
as  0.  Denote  Jt  as  a  n  x  n  matrix  with  the  (i,  i)th  element  as  1  and  other  element 
as  0.  Then 


hj  G  ^4,  Ji  G  B 

(2.35) 

Z  ^  ^  %ij  I ij  +  ^  ^  %iiJi 

i=l...n,j=l...n,i^j  i=l..n 

(2.36) 

Ji  —  j  ^  l-.-Tl,  j  7^  i 

(2.37) 

z  = 

^  v  %ij  I ij  “1“  ^  ^  %ii  \Jik  •>  Jki\ 

(2.38) 

i=l..n,j=l...n,i^j  i=l...n—l,k^i,k£l...n 


So  any  matrix  belonging  to  sl(n)  can  be  generated  from  linear  combinations  of 
B  and  [B,B], 

This  shows  that  with  the  non-holonomic  constraints,  W  can  still  be  moved 
freely  in  the  n 2  dimensional  vector  space,  and  thus  we  will  not  miss  any  possible 
solution. 
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2.3.4  EM  algorithm 

The  EM(Estimate  and  Minimize)  algorithm  is  widely  used  in  image  deconvolution. 
The  algorithm  can  achieve  good  result  in  image  processing  because  the  distribution 
of  all  image  can  fit  into  one  model.  This  is  not  true  for  sound  signals  can  not.  H. 
Attias  applied  the  EM  idea  to  develop  the  so  called  independent  factor  method  [6]. 

The  EM  method  is  based  on  maximizing  the  log-likelihood  with  respect  to  the 
parameters  of  the  generative  model  describing  those  data.  In  the  separation  model, 
instead  of  the  likelihood  E\\.ogp(y\W)\,  we  consider  the  likelihood  of  complete  data 
E[\ogp(y,  x,  g|W)],  where  q  denotes  the  parameters  of  a  distribution  model. 

The  algorithm  consist  of  two  steps  in  one  iteration[6]: 

(E)  Given  the  observed  data  and  the  current  model,  calculate  the  expected 
value  of  the  complete-data  likelihood. 

(M) Minimize  the  error  function,  i.e.  maximize  the  corresponding  averaged 
likelihood  with  respect  to  W. 

A  more  generalized  EM  algorithm  making  use  of  IGA  is  the  Seesaw  algorithm: 

1.  Fix  the  parameter  of  the  generative  model,  use  one  or  several  iterations  of 
ICA  rules,  then  update  the  posterior  estimation  using  the  new  separating 
matrix  W. 

2.  Fix  the  separation  matrix,  use  single  step  of  EM  followed  by  updating  the 
posterior  estimation  using  the  new  value  of  parameters  in  generative  model. 

With  the  separation  matrix  fixed,  the  source  signal  can  be  reconstructed  from 
the  sensor  signal.  In  the  existence  of  noise,  a  Least  Mean  Square  or  Maximum 
aposterior  probability  estimator  can  be  used  to  recover  the  source  signal. 
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2.3.5  Stability  analysis 


A  stationary  point  of  the  algorithm  in  the  above  algorithms  is  characterized  by 
the  fact  that  the  update  of  W  has  zero-mean.  Therefore  any  invertible  stationary 
point  should  be  a  solution  of 

E[G(Wx)}  =  E[G(y )]  =  0  (2.39) 

The  separating  stationary  points  are  characterized  as  follows.  Starting  with  the 
regular  case,  let 

A,r  =  diag(Xi, ...,  An)  (2.40) 

with  each  scalar  A*  being  a  solution  of 

Elcp^XiS^XiSi]  =  1,  i  —  (2.41) 

Because  of  the  independence  assumption  and  the  zero-mean  assumption,  it  is  then 
easily  checked, that  E[Gr(Ars)]  =  0.  Therefore  the  matrix  W  =  ArA_1  is  such  that 
y  =  Wx  =  Xrs  is  a  separating  matrix  and  is  a  stationary  point  of  equation  2.31 
with  W{t  +  1)  =  W{t)  =  W0. 

With  the  stationary  points  characterized  we  now  study  their  stability.  If  these 
stationary  points  are  not  stable  they  can  not  be  attractors  in  terms  of  specific 
moments  of  the  source  distributions.  The  definition  of  these  moments  depends  on 
the  non-linear  functions  used  in  the  function  G. 

Asymptotic  analysis  yields  two  types  of  stability  conditions.  The  first  type  is  a 
source-wise  condition  expressing  that  the  estimation  of  the  scale  of  each  source 
should  be  stable.  The  second  type  of  conditions  is  pair-wise.  The  scale  stability 
condition  for  the  ith  source  is  found  to  be: 


1  +  E[v'i(yi)yi}  >o,  i  <  i  >  n 


(2.42) 
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One  often  uses  non-linear  functions  (fi  which  are  non- decreasing  and  thus  have  a 
positive  derivative.  In  this  case,  the  scale  stability  conditions  are  readily  met.  The 
pair-wise  stability  conditions  turn  out  to  depend  on  the  non-linear  functions  pds 
and  on  the  distributions  of  the  sources  via  the  moments.  Define: 


ki  =  EWi{yi)\E[y2i ]  -  E[fi{yi)yi\  (2.43) 

According  to  Jean-Francois  Cardoso  [7],  The  pair-wise  stability  conditions  are: 

(1  +  ki)(l  +  kj)  >  1,  1  <  i  <  j  >  n  (2.44) 

1  +  ki  >  0,  1  <  i  >  n  (2-45) 

For  the  natural  gradient  learning  algorithm,  we  consider  the  learning  equation 
in  its  continuous  time  version  as 


W(t)  =  !i(t)[I  -  v(y(t))yT(t)\W(t)  (2.46) 

where  W  denotes  time  derivative  of  the  matrix  W{t).  We  consider  the  expected 
version  of  the  learning  equation 

W(t)  =  -  y>(y(t))yT(t)]W(t)  (2.47) 


By  linearizing  it  at  the  equilibrium  point,  we  have  the  variational  equation 


SW(t)  =  (i(t) 


dE[I  -  p{y(t))yT{t)}W 
dW 


5W 


(2.48) 


Only  when  all  the  eigenvalues  of  the  operator  (dE[I  —  tp(y(t))yT(t)]W)/(dW)  have 
negative  real  parts  is  the  equilibrium  asymptotically  stable.  Therefore,  we  need  to 
evaluate  all  the  eigenvalues  of  the  operator.  Since  I  —  ip(y{t))yT (t)  is  derived  from 
the  gradient  dl  as  in  2.31,  we  need  to  calculate  its  Hessian  d2l 


d2i  =  T ; 


dL(y,W) 

dwijdwkl 


dwijdwki 


(2.49) 
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in  terms  of  dX.  The  equilibrium  is  stable  if  and  only  if  the  expectation  of  the 
above  quadratic  form  is  positive  definite.  Define 


ffi2  =  E[y}\ 

(2.50) 

hi  =  E[(pi(yi)\ 

(2.51) 

rrii  =  E[y?(p(yi )]  where  Lp  =  dip/dy 

(2.52) 

The  separating  solution  is  a  stable  equilibrium  of  the  learning  equation  if  and  only 

if  [2]: 

m,  +  1  >  0 

(2.53) 

o 

A 

-if 

(2.54) 

afa2jkikj  >  1 

(2.55) 

for  all  i,  j,  where  %  ^  j. 

It  is  not  difficult  to  show  that  the  stationary  point  does  exist.  However,  global 
convergence  results  are  not  available  for  n  >—  2  case. 

2.4  Variation  in  modeling 

All  the  above  methods  assume  an  instantaneous  linear  mixing  model.  However, 
this  model  does  not  truly  reflect  how  sound  transfers  and  mixes  when  it  travels 
through  air.  In  this  section,  we  will  explore  more  complex  models  that  simulate 
this  process  more  truthfully. 
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2.4.1  How  sound  spread  in  closed  space 

At  normal  pressure  and  standard  conditions  of  humidity,  the  speed  of  sound  is  a 
function  of  temperature[13] : 

s  =  (331.4  +  0.6 1)  m/s  (2.56) 

In  a  typical  sized  room,  successive  reflections  are  too  close  together  to  be  audible 
as  separate  events.  For  a  mid-frequency  sound  wave  with  given  source  and  receiver 
position  the  room  acoustic  effect  can  be  seen  as  a  linear  time  invariant  system 
producing  a  sum  of  attenuated,  filtered,  and  delayed  versions  of  the  original  signal. 

Now  we  consider  the  intensity  of  the  reflected  signal.  Intensity  is  defined  as  the 
amount  of  sound  energy  flowing  across  a  unit  area  surface  in  a  second  and  follows 
an  inverse  square  law  with  distance.  Thus  reflected  sound  is  weaker  than  sound 
coming  from  the  source  directly  because  it  travels  longer  distance. 

Consider  how  sound  spreads  in  a  room.  The  audio  character  of  the  room  is 
decided  by  the  shape  and  layout  of  the  room,  the  position  and  direction  of  both  the 
source  and  receivers,  and  the  absorption  characteristics  of  the  boundaries.  Most 
materials  absorb  more  high  frequency  energy  than  low  frequency  energy  and  thus 
the  reflected  sound  is  a  low-pass  filtered  version  of  the  original  signal. 

The  instantaneous  mixing  model  does  not  show  the  room  acoustics  exactly, 
that’s  why  we  need  to  look  at  convolution  model. 

2.4.2  Filter  convolution  model 

The  most  common  model  [5]  is  one  in  which  the  observation  x(k)  is  assumed  to 
be  produced  from  s(k)  as 

OO 

x(k)  =  £  Up*  (. k-p )  (2.57) 

P=  —  OO 
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where  Hp  is  an  n  x  m  dimensional  matrix  of  unknown  mixing  coefficients  at  lag 
p.  The  goal  is  to  calculate  y(k)  =  [yi(k)  ■  ■  ■ ym(k)]T  of  possibly  scaled  and  delayed 
estimates  of  the  source  signals  in  s(k)  from  x(k )  using  a  causal  FIR  filter  given  by 

L 

y(k)  =  Wp(k)x(k  -  p)  (2.58) 

p= o 

where  Wp(k),  0  <  p  <  L  is  a  (m  x  n)  dimensional  matrix.  We  have  W(z)H(z)  = 
PA(z)  where  P  is  a  permutation  matrix  and  A(z)  is  a  diagonal  matrix  with  A iZ~Ti 
on  the  diagonal  entry. 

2.4.3  State  space  model 

A  state  space  model  is  intended  to  express  convolution  in  real  world.  Let 


X(k+  1)  =  AX(k)  +  BS(k)  +  Pn[k)  (2.59) 

u{k)  =  CX(k)  +  DS(k)  (2.60) 

The  transfer  function  is 

H(z)  =  C(zl-  A)~lB  +  D  (2.61) 

and  the  demixing  model  is 

x(k  +  1)  =  Ax(k)  +  Bu(k)  +  Ln(k)  (2.62) 

Y(k)  =  Cx(k)  +  Du(k )  (2.63) 

The  transfer  function  of  the  demixing  system  is 

W(z)  =  C(zl  -  A)~1B  +  D  (2.64) 


As  we  can  see,  it’s  more  general  than  the  convolution  model  mentioned  before 
in  section  2.4.2 [20] . 


20 


2.5  Blind  deconvolution 


It  has  been  shown  that  BSS/BSD  problems  are  closely  related  in  structure [11], 
The  similarity  between  BSS/BSD  makes  it  possible  to  solve  BSD  problem  using 
ideas  and  methods  for  BSS. 

Consider  the  n-dimensional  source  separation  task  with  H  as  a  circulant  matrix 
with  the  first  row 

Hi  =  [ho  •  •  •  h-M  0  ■  ■  •  0  Iim  ■  ■  ■  hi]  (2.65) 


Then 

M 

Xi{k)  =  J2  hPsli-P\n(k )  (2.66) 

p=—M 

where  [.]„  denotes  the  rnod-n  operation.  Thus  x{k)  is  obtained  from  the  circular 
convolution  of  the  channel  impulse  response  hj,—M  <  j  <  M,  and  the  source 
sequence.  To  extract  the  source  sequence,  define  a  demxing  matrix  W (k)  as 


Wi-j(k)  if  [\i-j\]n<L 

0  otherwise 

The  goal  is  to  adjust  W (k)  such  that 

lim  W(k)H  =  PA  (2.68) 

k — »oo 

where  P  is  a  permutation  matrix  with  a  single  entry  in  any  row  or  column  and  A 
is  a  diagonal  nonsingular  matrix. 

Since  the  product  of  two  circulant  matrices  is  also  a  circulant  matrix,  W (k)  as 
defined  above  is  adequate  to  separate  the  source  sequence. 

As  the  dimension  of  H(z)  goes  to  infinity,  the  central  part  of  circulant  matrix 
becomes  a  Toepliz  matrix  so  circulant  convolution  becomes  convolution  as  long  as 
the  sequence  defining  the  matrices  are  absolutely  summable. 
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Hence 


OO 


Xi(k) 

=  hAk  +  i~j) 

j=- oo 

(2.69) 

Vi(k) 

oo 

=  5]  Wj{k)xi-j{k) 

(2.70) 

j=~  oo 


This  is  a  model  for  BSS  problem.  Thus  with  proper  changes  to  the  blind 
separation  algorithm,  we  can  get  an  algorithm  to  solve  the  single  channel  blind 
channel  deconvolution  problem. 

If  the  problem  is  considered  in  the  frequency  domain,  convolution  becomes 
multiplication  and  the  deconvolution  problem  in  time  domain  becomes  an  instan¬ 
taneous  demixing  problem  in  the  frequency  domain.  That  is,  for 

X(f)  =  AfS(f)  (2.71) 

We  are  looking  for  Wf  such  that 

Y(f)  =  Wjx(f)  (2.72) 

is  the  closest  estimation  of  S(f).  In  section  3.3.3  we  will  derive  a  deconvolution 
algorithm  from  this  idea. 

2.5.1  Learning  under  state-space  model 

The  state-space  model  can  also  be  used  to  describe  a  blind  separation  and  decon¬ 
volution  system.  Although  theoretically  transfer  function  models  are  equivalent 
to  state-space  models,  it  is  difficult  to  exploit  any  common  features  that  may  be 
present  in  real  dynamic  systems  by  using  transfer  function.  State-space  models 
also  make  it  much  easier  to  deal  with  the  stability  problem  and  the  realization 
problem  and  enable  more  general  descriptions  than  FIR  filtering. 
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Suppose  the  mixing  model  is  described  by  a  stable  linear  state  discrete  system 


X(k+1)  =  AX(k)  +  BS(k)  +  (k)  (2.73) 

u(k)  =  Cx(k)  +  Ds(k)  +  6(k)  (2.74) 

where  s(k)  is  a  m-dimensional  source  vector,  u(k)  is  the  n-dimensional  sensor 
signals,  x  is  the  state  vector  of  the  system,  £p(k)  is  the  process  noise,  and  6{k)  is 
the  sensor  noise  of  the  mixing  system.  The  transfer  function  of  the  system  without 
noise  is 

H(z)  =C(zI  -  A)~lB  +  D  (2.75) 

The  demixing  model  is  another  linear  state-space  system 

x(k  +  1)  =  Ax(k)  +  Bu(k)  +  L^R{k)  (2.76) 

y(k)  =  Cx(k)+Du(k )  (2.77) 

where  (k)  is  the  reference  model  noise.  The  goal  is  to  adjust  A,B,C,D,L  such 
that 

W(z)  =  C(zl  -  A)~1B  +  Dy(k)  =  W{z)H{z)  =  PA{z)  (2.78) 

where  P  is  a  permutation  matrix  and  A(^)  is  a  diagonal  matrix  with  A iZ~Ti  as  the 
diagonal  elements. 

If  the  matrix  D  is  full  rank,  the  inverse  system  exists. 

Parameter  C,  D  can  be  learned  with  same  scheme  as  described  in  the  FIR 
filter  model.  We  estimate  the  state  vector  based  on  Kalman  filter  method  instead 
of  directly  adjusting  the  A,  B  matrices. 

x(k  +  1)  =  Ax(k)  +  Bu(k)  +  Kr(k)  +  ^R(k)  (2.79) 

where  K  is  the  Kalman  filter  gain  matrix,  r(k)  is  the  innovation  vector.  In  this  case 
no  explicit  residual  r(k)  is  available  because  the  expected  output  y(k)  should  be 
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the  original  signal  which  is  unknown.  Here  we  use  an  estimation  technique  known 
as  hidden  innovation  defined  by 

r(k)  =  A  y(k)  =  A  Cx(k)  +  ADu(k)  (2.80) 

The  hidden  innovation  indicates  the  direction  to  adjust  the  output  of  the  demixing 
system  and  is  used  to  generate  an  aposterior  state  estimate.  Now  the  common 
Kalman  filter  can  be  used  as  follows  to  estimate  the  state  vector  x{k). 

1.  Compute  the  Kalman  gain  matrix 

K(k)  =  P{k)C(k)T[C(k)P(k)CT(k)  +R{k)}-1 

2.  Update  the  state  vector  using  the  hidden  innovation 

x(k)  =  x(k)  +  K{k)r{k ) 

3.  Update  the  error  covariance  matrix 

P(k)  =  [I  -  K{k)C{k)\P(k)  (2.83) 

4.  Evaluate  the  error  covariance  matrix  ahead 

P(k)  =  A(k)P(k)A(k)T  +  Q(k)  (2.84) 

with  initial  condition  P(0)=I,  where  /  is  the  identity  matrix.  Here  Q(k),  R(k) 
are  the  covariance  matrices  of  the  noise  vector  £r>  and  output  measurement 
noise  rq.  respectively. 

The  state-space  idea  is  appealing  to  people  in  the  held  of  controls  because  it 
makes  clever  use  of  the  Kalman  filter.  However,  the  idea  is  expensive  in  computa¬ 
tion  and  thus  not  practical  in  real  situations. 


(2.81) 


(2.82) 
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2.5.2  Learning  under  the  filter  deconvolution  model 

Generally  speaking  the  same  cost  function  can  be  used  for  both  the  separation 
problem  and  the  deconvolution  problem  as  long  as  the  filter  coefficient  sequence 
has  a  bounded  L2  norm. 

The  following  are  different  algorithms  derived  from  different  error  functions: 


1.  Minimize  mutual  information  between  outputs 


I(Y,  X)  =  H(Y)  -  H(Y/X)  (2.85) 


where  I(Y,  X)  is  the  mutual  information  contained  in  the  output  Y  about 
the  input  X,  H(y )  is  the  entropy  of  Y. 


H(Y/X )  is  the  entropy  of  the  output  Y  that  did  not  come  from  the  input 
X.  Then 


^HY,x)  =  YHm 


since  H(Y/X )  does  not  depend  on  u>. 


(2.86) 


2.  Minimize  the  Kullback  divergence  between  the  output  and  the  product  of 
marginal  output 

D,  ,7  =  r  lr(Y)  logf  MY  )  =  —H(Y)  +  y  h(y A)  (2.87) 

frtfr  >  8'n  UfnM  h 

where  H(-)  is  the  entropy  of  a  probability  function. 

3.  Maximum  likelihood  [6] 

6  =  [d,n],  6~l  =  [kffi1,  —  A~lrn]  (2.88) 


where  n  is  observation  noise. 


px(x,6 )  =|  detA  |  1  g[A  x(x  —  n)]  (2.89) 


25 


According  to  S.  Amari,  S.  C.  Douglas,  A.  Cichocki  and  H.  H.  Yang  [5],  the 


most  effective  form  is 


J(w(z,  k)) 


m 

X!log  Pi(Vi(k)) 

i—  1 


l 

27 Tj 


log \detW(z,  k)\z  1dz 


Define 


<Pi(yi)  =  -~r^Pi{yi) 

dyi 


Then 


m  m 

d(-  l°SPi(yi(k )))  =  PiiViiWdyiik)  =  <PT(y(k))dx(z ,  k)y(k) 

i=  1  i=  1 


Similarly 

j>  tr(dW(z,  k)W~x(z,  k))z~ldz  =  tr(dX0(k )) 
Thus  the  natrual  gradient  deconvolution  algorithm  is 


d(^— :  j)  log \detW(z,  k)\z  ldz)  = 


27 rj 


Wp(k  +  1)  =  Wp(k)  +  n[Wp(k)  -  <p(y(k))uT(k)}  (2.90) 


with 

OO 

Up(k)=  Y:  Wj (k)y(k  —  p  +  q)  (2.91) 

q=— oo 

In  a  practical  implementation,  one  can  approximate  the  doubly  infinite  filter 
with  a  FIR  causal  filter  given  by 

y{k)  =  YWp{k)x{k-P)  (2.92) 

p= o 

We  now  have 


Wp(k  +  1)  =  Wp[k)  +  y[Wp(k)  -  <p( y(k  -  L))u*T {k  -  p)\  (2.93) 


with 

u(k)  =  YW?_q(k)y(k  +  q)  (2.94) 

q= 0 
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We  now  show  the  same  equation  2.93  can  be  deducted  from  the  natural  gradient 
ICA  in  the  frequency  domain.  First  we  have 

X(f)  =  AfS(f)  (2.95) 

Where  X(f)  and  S(f)  are  the  Fourier  transform  of  the  mixtures  and  the  original 
sources  at  frequency  /.  We  are  looking  for  Wf  such  that 

Y(f)  =  WfX(f)  (2.96) 

is  the  closest  estimation  of  S(f). 

The  learning  rule  from  the  natural  gradient  algorithm  [4]  is 

A W  =  n[I  -  ip{y)y*T]W  (2.97) 

Rewriting  the  same  equation  in  the  frequency  domain  we  have 

A  Wf  =  n[I  -  fft{v{y))YJT]Wf  (2.98) 

Since  the  assumption  of  independence  is  also  valid  in  the  frequency  domain,  all 
the  deductions  we  have  done  before  are  still  valid  and  the  frequency  domain  ICA 
algorithm  has  the  same  form  as  the  above  equation  2.93. 

Now  apply  the  Fourier  transform  to  both  sides  of  2.98  to  get 

A W{z)  =  n\W{z)  -  ip{y)  *  y*T  *  W(z )]  (2.99) 

With 

up  =  [y(t-p)*T*W(z)]*T  (2.100) 

=  J2wqy(k~p  +  (i)  (2.101) 

We  have 

A W{z)  =  n\W (z)  -'Zviy-L)*  u;T]  (2.102) 

This  is  exactly  the  same  format  in  [5]  as  mentioned  above  in  2.93. 


27 


2.5.3  Kuicnet  approach 


According  to  the  Central  Limit  Theorem,  the  sum  of  n  independent  identically 
distributed  random  variables  has  a  Gaussian  distribution  as  n  goes  to  infinity. 
For  n  independent  but  not  identically  distributed  random  variables,  if  their  PMF’s 
satisfy  certain  conditions,  then  the  central  limit  theorem  still  holds.  Define  kurtosis 
of  a  random  variable  as 

E(y(k)4)  -  3(E(y(k)2)2)  (2.103) 

Obviously  a  Gaussian  random  variable  has  kurtosis  equal  to  0.  One  can  verify 
that  the  closer  the  kurtosis  is  to  0,  the  closer  the  random  variable  is  to  a  Gaussian 
random  variable.  Kurtosis  can  then  be  used  as  a  cost  function  for  separation  and 
deconvolution. 

For  the  single  channel  problem  [12]  we  have 


w{k  +  1)  =  w{k)  +  /i[\ y(k  —  L)\2y{k  —  L)u*{k)  —  \y(k  —  L)\Aw{k))  (2.104) 


where 


Alternatively 


where 


L 

u{k)  =  Ytwl-j{k)y{k-j)  (2.105) 

l=o 

w{k  +  1)  =  w(k)  +  y[y*(k)fT(y(k))  -  F(y(k))]w{k)  (2.106) 

y(k)  =  [y(k)---y(k-L)}T  (2.107) 

F(y(k))  =  {diag\y{k)\A,- ■  ■  ,\y{k- L)\a}  (2.108) 

f{y{k))  =  [\y{k)\2y{k)  ■  ■  ■  \y(k  —  L)\2y{k  —  L)]T  (2.109) 


Chapter  3 


Sub-band  Based  ICA  Algorithm 


In  last  chapter  we  introduced  several  methods  of  blind  separation  and  blind  de¬ 
convolution.  By  theoretical  deduction  and  experimental  result  we  showed  that 
the  most  effective  method  is  the  non-holonomic  algorithm  proposed  by  Shunichi 
Amari,  T.  P.  Chen  and  A.  Chichocki. 

In  this  chapter  the  algorithm  is  further  developed  by  making  use  of  wavelets.  A 
more  robust  and  faster  algorithm,  namely  the  sub-band  ICA  algorithm  is  pre¬ 
sented.  In  section  1,  a  brief  introduction  to  the  human  hearing  system  is  given. 
This  system  is  the  inspiration  for  the  sub-band  ICA  algorithm.  In  s  section  2,  the 
process  of  sub-band  ICA  is  described  in  detail.  In  section  3,  several  problems  of 
the  sub-band  ICA  implementation  are  discussed.  The  sub-band  ICA  algorithm  is 
extended  to  solve  the  deconvolution  problem. 

The  idea  of  subband-based  ICA  idea  comes  from  the  fact  that  humans  process 
acoustic  signals  on  different  frequency  bands  independently.  This  method  devel¬ 
oped  by  Yuan  Qi,  P.S.  Krishnaprasad  and  S.  Shamma  [18]  provides  robust  perfor¬ 
mance  in  the  presence  of  noise  and  reduces  the  computational  complexity.  This 
idea  enables  a  real-time  separation  method. 
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3.1  Filter-bank  structure  in  the  human  ear 

It  is  well  known  that  sound  waveform  spreading  in  the  air  is  transformed  in  vi¬ 
brational  mechanical  energy  in  the  middle  ear,  then  further  in  to  the  vibration  of 
the  basilar  membrane  located  inside  the  cochlea.  Since  the  basilar  membrane  is 
narrow  and  stiff  near  the  base,  and  wider  and  softer  near  the  end.  High  frequencies 
excite  the  base  portion  more  strongly  than  the  end  portion;  on  the  contrary  low 
frequencies  excite  the  base  portion  more  weakly  than  the  end  portion.  High  and 
low  frequency  disturbances  arrive  at  their  respective  peak  basilar  membrane  points 
nearly  simultaneously.  This  leads  to  the  assumption  that  the  basilar  membrane 
acts  like  a  filter  bank.  The  vibration  of  the  basilar  membrane  produces  motion 
of  the  stereocilia  which  then  cause  the  response  of  the  auditory  nerves.  Thus  the 
entire  process  of  human  hearing  starts  with  a  filtering  action.  The  engineering 
model  of  ear  is  showed  in  Figure  3.1 

Much  work  has  been  done  to  develop  the  bank  model.  For  example,  in  [13], 
Dudley  used  a  bank  of  ten  bandpass  filters,  each  with  a  frequency  width  s  of  300Hz, 
to  process  a  human  voice  ranging  from  300-3000Hz. 

3.2  Sub-band  ICA 

The  outline  of  the  algorithm  is  described  as  the  following: 

1.  Each  component,  Xj(n ),  of  the  observation  x(n)  is  hltered  through  a  filter 
bank,  resulting  in  a  subband  signal.  Two  possible  choices  of  the  filter  bank 
are  a  cochlea  filter  or  an  orthogonal  Daubechies  wavelet  packet.  Since  wavelet 
packet  is  easier  to  implement  and  provide  linearity,  the  Daubechies  wavelet 
packet  is  used  in  the  final  implementation. 
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Figure  3.1:  The  model  of  human  ear 


HIGHER  AUDITORY  CENTERS 


2.  The  power  of  the  decomposed  signal  in  each  band  is  computed  and  ordered. 

3.  The  ICA  learning  algorithm  is  applied  on  the  bands  with  the  strongest  power 

4.  The  soft-thresholding  algorithm  is  applied  to  the  subband  decomposed  sig¬ 
nals. 

5.  The  overall  demixing  matrix  W  is  received  from  the  demixing  matrix  of  each 
of  the  subbands,  by  using  the  competitive  learning  rule  to  cluster  the  rows 
of  the  demixing  matrices  on  different  sub-bands. 

Figure  A. 3  shows  the  structure  of  the  sub-band  ICA  method. 

The  sub-band  ICA  algorithm  improves  the  performance  of  ICA  in  the  presence 
of  noise.  If  the  noise  is  narrow-band,  then  good  separation  can  be  performed  on 
the  noise  free  sub-bands.  If  the  noise  is  broad  band,  ICA  is  performed  on  those 
sub-bands  with  strongest  power,  i.e.  largest  signal  to  noise  ratio(SNR). 

Since  the  wavelet  coefficients  are  typically  Laplace  distributed,  the  sub-band  signal 
has  a  more  peaky  and  heaviky  tailed  distribution  than  the  original  signal.  And 
thus  the  sub-band  based  ICA  converges  to  the  demixing  matrix  with  faster  speed. 
The  sub-band  based  separation  idea  has  also  been  proposed  by  Hyung-Min  Park 
[17].  In  Park’s  work  the  learning  is  performed  in  the  frequency  domain  and  actu¬ 
ally  incooparates  the  convolution  model  and  the  idea  of  speech  recognition.  The 
method  proposed  by  Yuan  Qi,  P.S.  Krishnaprasad  and  S.  Shamma  is  more  general, 
and  less  computationally  demanding. 
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3.3  Implementation  of  the  algorithm 

3.3.1  Use  adaptive  basis  selection  in  the  wavelet  packet 

Sub-band  ICA  enhances  the  separation  capability  of  standard  ICA  by  decomposit¬ 
ing  the  signal  into  different  frequency  bands.  The  design  of  the  filter  bank  can 
greatly  affect  the  performance.  For  example,  we  should  not  divide  the  signal  into 
two  bands  where  it  should  be  continuous  in  the  time-frequency  plane.  The  filter 
bank  design  should  vary  according  to  different  signal  properties.  The  solution  to 
this  problem  is  to  apply  the  adaptive  basis  selection  algorithm  [8]  on  the  summa¬ 
tion  of  all  the  mixed  signals  to  get  the  best  bases.  This  algorithm  ensures  a  filter 
bank  that  does  not  not  split  any  of  the  signals  into  improper  frequency  bands. 

3.3.2  Selecting  the  bands  to  perform  ICA 

The  ICA  algorithm  does  not  need  to  be  applied  on  every  band  coming  out  of  the 
filter  bank.  In  reality,  the  mixing  matrix  for  different  frequency  bands  varies  from 
one  to  the  other.  In  BSS,  however,  only  one  mixing  matrix  is  used  in  the  model. 
Estimating  the  mixing  matrix  of  the  bands  with  strongest  power  and  then  taking 
the  average  will  produce  the  best  separation  result.  From  the  experiment  on  human 
voice,  we  find  the  strongest  one-quarter  of  the  total  bands  always  contain  around 
sixty  percent  of  total  energy.  For  music  signals  the  energy  spread  more  evenly, 
but  one-quarter  to  one-half  of  total  bands  is  still  able  to  produce  good  separation 
result.  The  amount  of  computation  is  greatly  reduced. 
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3.3.3  Introducing  the  convolution  model  into  the  sub-band 
algorithm 

The  convolution  model  provides  a  much  better  simulation  of  how  sounds  are  mixed 
in  the  process  of  traveling.  By  introducing  this  model  into  the  sub-band  algorithm 
we  can  expect  it  to  produce  better  separation  performance,  although  with  a  heavier 
computation  load. 

THus  after  sub-band  filtering,  instead  of  applying  the  non-holonomic  1CA  al¬ 
gorithm,  we  apply  the  natural  gradient  algorithm  describe  in  Chapter  2, 

wp(k  +  1)  =  Wp(k)  +  p{k)  [Wp(k)  -  f(y(k  -  L)u*T {k  -  p)}  (3.1) 

where 

u(k)  =  ■£  WlT_Q(k)y(k  -  q)  (3.2) 

9—0 

The  mixing  filter  W  for  signals  in  different  frequency  bands  are  not  exactly  the 
same.  According  to  Ben  Gold[13],  the  surface  of  any  object  has  different  reflection 
absorption  characteristics  to  sound  waves  of  different  frequencies.  In  addition, 
waves  of  different  frequencies  show  different  diffusion  when  there  is  an  obstacle  in 
the  path  of  transmission.  So  instead  of  taking  a  weighted  average  on  the  demixing 
matrices  to  get  an  overall  demxing  matrix  as  we  did  in  sub-band  ICA,  we  should 
separate  the  mixture  in  each  sub-band  by  it’s  own  demixing  matrix  and  then 
recover  the  original  signal  by  going  through  an  inverse  filter  bank.  Unfortunately, 
this  method  is  too  complex  in  computation.  The  experiment  shows  that  applying 
the  deconvolution  filter  for  the  sub-band  with  highest  energy  will  work  reasonabley 
well  to  recover  the  original  signals. 
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Chapter  4 


Working  with  Real  World  Signal 


In  Chapter  2  and  Chapter  3,  the  theoretical  aspects  of  BSS/BSD  algorithms  were 
discussed.  Here  we  explore  the  implementation  side.  An  integrated  system,  in¬ 
cluding  a  mobile  sensor  platform  and  computation  unit  is  built,  and  a  real  time 
ICA  algorithm  is  implemented  on  the  system  to  process  signals  recorded  by  the 
sensors.  In  Section  1  the  hardware  setting  and  environment  are  described;  in  Sec¬ 
tion  2  specific  problems  related  to  real-time  processing  are  discussed;  in  Section 
3  a  new  criterion  to  evaluate  the  performance  of  separation  without  knowing  the 
mixing  matrix  is  introduced  and  verified. 

4.1  Hardware  and  environment 

To  give  an  overview  of  the  hardware  environment  in  this  project,  we  first  mention 
the  hardware  used  in  the  sequence  of  data  flow: 

1.  Styrofoam  head  equipped  with  two  microphones  and  amplifiers.  The  micro¬ 
phones  are  mounted  approximately  at  the  ears  and  are  used  to  collect  sound 
data. 
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2.  Nomadic  Technology  Super  Scout  II  mobile  robot.  This  is  the  mobile  plat¬ 
form  that  carries  the  sensors  and  transfers  data  to  computing  unit. 


The  capability  and  functions  of  the  hardware  components  are  described  respec¬ 
tively  below. 
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4.1.1  Styrofoam  head  and  robot 

The  task  of  collecting  data  is  performed  by  Nomadic  Super  Scout  11  mobile  robot. 
A  Styrofoam  head  is  mounted  on  the  robot.  In  the  position  of  its  ears,  two  mi¬ 
crophones  are  installed  to  simulate  the  human  auditory  perception.  A  specially 
designed  amplifier  is  used  to  transfer  the  signal  from  the  microphones  to  a  proper 
level  for  the  sound  card  on  the  robot.  The  amplifier  needs  a  DC  power  supply  that 
can  either  be  provided  by  batteries  or  the  power  supply  of  robot. 

Data  is  sampled  at  a  rate  of  8KHz,  and  is  transfered  as  blocks  containing  512 
data  elements.  Each  data  element  is  represented  in  PCM  format  as  a  16  bit  signed 
integer. 

The  robot  is  equipped  with  a  set  of  touch  sensors  and  sonar  sensors,  and  a 
dedicated  board  to  control  the  motors  and  sensors.  At  the  top  level  is  a  PC/104 
embedded  PC.  RedHat  Linux  runs  as  the  host  operating  system.  The  robot  is 
connected  to  a  wireless  network  via  an  IEEE  802.11  network  card. 

4.1.2  Windows  NT  workstation 

Windows  NT  provide  a  versatile  platform  to  cooperate  among  the  console  appli¬ 
cation,  TI  code  composer,  the  robot  and  MATLAB.  In  this  project,  Windows  NT 
acts  as  the  bridge  between  data  collector  and  the  computing  unit.  The  workstation 
here  is  Gateway  E-5200  machine  with  the  Windows  NT  operating  system  and  will 
be  called  “the  host”  in  the  following.  A  console  program  runs  on  the  Windows  NT 
machine,  which  communicate  between  the  robot  and  DSP  processor.  It  is  referred 
to  as  the  “host  program” . 
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4.1.3  DSP  processor 

The  DSP  we  use  in  this  project  is  the  TMX320C6701  float  point  digital  signal 
processor,  which  has  a  167  Mhz  clock  rate  and  allows  four  float  point  arithmetics, 
two  fixed  point  arithmetics  and  two  multiplications  to  happen  at  the  same  time. 
The  processing  power  helps  the  complicated  ICA  algorithm  to  be  implemented  in 
real  time. 

In  the  design  of  this  project,  the  DSP  processors  are  responsible  for  executing 
the  ICA  or  deconvolution  algorithm,  denoising  the  input  data,  and  calculating  the 
separated  output.  The  computation  task  is  divided  between  two  DSP  processors. 

The  master  DSP  perform  the  following  tasks:  communicate  with  the  host  ap¬ 
plication;  reads  the  input  data  stream;  passes  the  data  through  a  wavelet  filter 
bank;  denoises  it;  puts  the  intermediate  data  into  shared  RAM  for  the  slave  DSP 
to  read,  and  reads  the  output  of  the  slave  DSP  when  it  is  ready,  ;  implements  the 
wavelet  reconstruction,  multiplies  by  the  separation  matrix  to  produce  the  output. 
The  slave  DSP  calculate  the  separation  matrix  from  the  data  it  finds  in  the  shared 
RAM  and  puts  the  the  new  separation  matrix  into  the  shared  RAM  for  the  master 
DSP  to  read. 

The  task  of  the  master  DSP  is  performance  critical  to  ensure  a  continuous  sound 
output.  All  the  computation  has  to  be  performed  at  a  speed  faster  than  the  the 
speed  at  which  sound  data  is  collected.  If  the  data  processing  is  not  fast  enough, 
some  data  will  be  lost.  The  task  of  the  salve  DSP,  however,  is  not  that  urgent. 
In  a  realistic  environment,  the  mixing  matrix  is  not  changing  very  fast.  A  separa¬ 
tion  matrix  calculated  from  data  collected  1/10  second  ago  will  still  work  well  to 
separate  the  current  data. 
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4.1.4  Python  board 


The  Python/C6  multi-DSP  board  installed  on  the  NT  machine  can  support  up  to 
four  6701  DSP  chips.  A  maximum  of  4MB  of  shared-RAM  between  the  C60  can 
serve  for  inter-DSP  communication. 

A  set  of  APIs  are  provided  to  communicate  between  the  host  machine  and  the 
DSP  chips.  The  C60  Native  API  provides  DSP  programmers  direct  control  over 
the  resources.  The  C60  Host  API  allows  an  application  on  the  host  to  handle  basic 
I/O  operations  with  the  Python/C6,  and  must  be  used  in  conjunction  with  a  C60 
Native  API  application  running  on  the  Python/C6’s  DSPs. 

4.1.5  Room  environment 

In  this  experiment,  the  reflection  from  the  wall  and  objects  in  the  room  cannot  be 
neglected.  The  robot  will  be  put  in  the  center  area  of  a  10  x  12  empty  space  and 
slowly  moved  in  circle  of  radius  two  to  three  feet.  Around  the  empty  space  are 
walls  and  tables  that  will  scatter  and  reflect  sound.  The  separated  data  is  played 
out  through  a  speaker  in  another  area  of  the  room  and  thus  will  not  produce  any 
interference  with  the  room  acoustics. 


4.2  Considerations  for  a  real  time  implementa¬ 
tion 

All  the  algorithms  we  discussed  in  previous  section  are  designed  for  off-line  ex¬ 
periment.  But  here  we  will  conduct  the  experiment  in  real-time.  A  continuous 
output  of  the  separated  sound  signal  is  desired,  with  as  little  delay  as  possible. 
Thus  several  adjustments  to  the  algorithm  need  to  be  made. 
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4.2.1  Data  communication 


A  program  on  the  robot  builds  the  socket  connection  with  the  host  program,  reads 
the  data  from  a  buffer,  and  periodically  sends  the  data  to  the  host  through  the 
socket. 

The  host  program  is  responsible  for  the  following  tasks:  1.  building  socket  connec¬ 
tions  with  the  data  collecting  process  on  the  robot  and  the  data  play-out  process; 
2.  loading  the  sub-band  ICA  program  on  to  the  DSP  chips,  and  communicating 
control  message  and  display  information  to  the  DSP  chips.  The  main  structure  of 
the  host  program  is  a  loop  waiting  for  command  input  from  console.  The  process¬ 
ing  functions  are  used  as  call-back  functions  from  a  dynamic  link  library.  When 
message  arrives,  such  structure  produce  the  fastest  response  to  a  message  coming 
from  DSP. 

4.2.2  Optimizing  the  code  for  speed  and  memory  size 

Many  tools  were  provided  with  the  DSP  to  speed  up  the  program.  Here  we  chose 
to  use  the  following: 

1.  Write  the  most  computationally  demanding  part  in  assembly  language.  As 
the  wavelet  filtering  and  convolution  part  is  executed  on  every  incoming 
data  block,  the  task  is  heavy  and  time  critical  and  should  thus  be  written  in 
assembly  language.  Other  parts  of  program  are  still  in  C  to  make  the  whole 
project  easy  to  read  and  manage. 

2.  Select  optimizing  parameter.  In  the  compiler  provided  by  TI,  there  are  sev¬ 
eral  choices  about  how  to  optimize  the  code.  The  program  is  both  time- 
critical  and  memory  critical  and  the  compiling  parameter  should  be  set  ac- 
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cordingly. 


3.  Make  use  of  the  parallel  processor.  On  the  the  Python  there  are  4  DSP 
chips,  and  each  chip  has  the  ability  to  execute  several  manipulation  con¬ 
currently.  Evenly  dividing  the  tasks  between  several  DSPs  and  performing 
matrix  computations  in  parallel  can  speed  up  the  process  a  lot. 

The  high  speed  memory  for  each  DSP  is  small  compared  to  the  large  amount  of 
data.  The  strategy  is  to  dynamically  allocate  and  release  fast  memory  inside  every 
cycle.  The  shared  memory  among  the  DSPs  is  slow  so  the  data  exchange  between 
DSPs  needs  to  be  carefully  designed. 

4.2.3  Tuning  parameters  for  the  best  performance 

Much  research  has  been  done  and  many  different  approaches  have  been  proposed. 
We  have  described  the  reason  to  choose  sub-band  ICA  with  non-holonomic  natural 
gradient  algorithm  as  the  core  algorithm.  Still  there  are  several  variables  unde¬ 
cided.  The  choice  is  made  by  balancing  computational  load  and  performance,  and 
considering  the  characteristic  of  the  original  signals. 

1.  Choice  of  non-linear  function  tp(y).  The  optimal  choice  is 

<Pi(Vi)  =  ~dlog(pi(yi)))/dyi 

which  yields  the  fastest  convergence  behavior.  Convergence  still  happens 
using  other  functions.  For  the  case  of  voice  a  sub-Gaussian  signal, 

Mvi)  =  \yi\2Vi 

yields  adequate  separation.  For  super- Gaussian  sources, 

fi(yi)  =  tanh^yi)  with  7  >  2 
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produce  good  result. 


2.  Choice  of  fi.  The  learning  process  should  be  fast  enough  to  follow  the  changes 
in  mixing  matrix  resulted  by  the  movement  of  robot  and  thus  this  parameter 
cannot  be  too  small.  If  /i  is  too  large  an  overflow  can  result  and  thus  a  monitor 
should  be  set  in  program  to  reset  the  learning  process  without  causing  error. 

3.  Choice  of  block  size  and  filter  channels.  This  needs  to  be  considered  with  the 
constraint  of  memory  size.  In  our  implementation  we  chose  data  block  with 
512  data  elements  from  each  sensor,  filtered  through  16  channels.  The  ICA 
algorithm  is  applied  to  four  or  eight  of  them  with  the  biggest  power  intensity. 

4.2.4  Smoothing  between  blocks 

The  calculation  is  based  on  512  data  elements  blocks,  which  corresponds  to  a 
sample  of  0.064  second.  To  produce  a  consistent  output  sound  with  good  quality, 
the  separation  matrix  used  for  each  block  must  not  change  too  much  in  one  step, 
and  must  be  consistent  in  scale  factor.  Two  measure  were  used  here  to  prevent 
inconsistency  between  the  learning  result  of  two  consecutive  blocks.  First,  the 
separation  matrix  of  the  last  block  was  used  as  the  start  value  of  the  learning 
process  for  the  new  data  block.  Second,  clustering  was  used  to  match  the  output 
channels. 


4.3  Performance  evaluation  criterion 

In  the  case  where  the  mixing  matrix  is  known,  the  performance  can  be  measured 
by  the  product  of  mixing  matrix  and  demixing  matrix.  Defining  P  =  W A,  the 
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most  commonly  used  performance  index  is: 

n  n  I  I  n  n  |  | 

E  =  V( V  lPv  ,  -  1)  +  V(V  lPlJ  ,  -  1)  (4. 

maxk\pik\  maxk\pkj\ 

It  is  easily  seen  that  perfectly  separated  signals  will  have  E  equal  to  zero.  The 
smaller  the  value  of  E,  the  better  the  separation  is. 

When  no  information  about  mixing  matrix  is  available  the  only  clue  we  have  is  the 
output  signal.  The  mixed  signals  are  similar  to  each  other  while  well  separated 
signals  should  be  different  in  shape.  However,  the  effect  of  scaling  and  delaying 
must  not  be  considered,  because  the  separation  algorithm  is  not  controlled  in  these 
two  directions.  Thus  the  signals  cannot  be  compared  directly  in  the  time  domain. 
One  idea  is  to  calculate  the  statistical  distribution  of  the  output  signals  by  count¬ 
ing  histogram  and  comparing  the  result.  This  method  completely  removes  scaling 
and  delay  factors.  However,  although  different  sound  sources  produce  signals  with 
different  probability  distributions,  the  distribution  functions  can  be  very  similar 
to  each  other.  For  example,  two  speech  sentences  from  two  different  male  human 
speakers  have  almost  the  same  distribution  curve. 

Another  idea  is  to  consider  the  frequency  domain.  Transforming  the  signal  to 
the  spectral  domain  can  remove  most  of  the  delay  effect.  To  remove  the  scaling 
effect  and  reduce  computation,  a  natural  idea  is  that  of  Linear  Prediction  Coeffi- 
cients(LPC).  LPC  are  chosen  to  minimize  the  squared  error  between  the  observed 
and  predicted  signals.  The  predicted  signals  tend  to  be  consistent  in  spectrum 
with  the  original  signal  at  the  peak  but  not  at  the  valleys,  giving  an  envelop  of 
the  spectrum  of  the  original  signal.  Using  higher  order  coefficient  gives  more  accu¬ 
rate  estimation  but  is  computationally  more  expensive.  For  the  purpose  of  speech 
recognition,  a  tenth  order  predictor  has  the  lowest  Akaike  Information  Index(AII). 
In  this  project,  we  need  to  judge  the  separation  result  of  signals  with  a  wider  spec- 
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Judged  by  listening 

very  successful 

good 

poor 

index  E 

0.2247 

1.3773 

3.5344 

LPC  index 

68.6908 

0.8948 

0.2402 

Table  4.1:  Comparison  of  two  kind  of  performance  index 

tram  than  speech  signal  (for  example,  music),  we  choose  the  LPC  order  to  be  20. 
The  normalized  difference  between  LPC  of  the  original  signal  and  the  separated 
signal  is  a  good  criterion  for  the  two  signal  case.  Here  is  a  comparison  between  a 
commonly  used  index  and  a  LPC  index  calculated  on  separated  signals  with  known 
mixing  matrix  in  Table  4.1  .  We  can  see  the  LPC  index  well  represents  the  quality 
of  separation.  The  bad  thing  about  it  is  the  index  may  be  very  large  when  we  have 
a  very  good  separation  (without  noise).  That  is,  it  is  not  normalized,  not  like  the 
index  E  as  defined  in  equation  (4.1). 
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Chapter  5 


Experimental  Results 


In  this  chapter  different  experiments  are  described  in  detail  and  the  results  are 
displayed  and  compared  with  separation  result  that  appear  in  earlier  literature. 
From  the  analysis  of  these  experimental  results,  some  important  conclusion  about 
the  ICA  model  are  derived. 


5.1  The  effect  of  source  type  on  performance 

In  the  case  of  two  sound  sources,  we  compare  the  separation  results  of  two  male 
voices,  two  female  voices,  one  voice  and  one  instrument,  and  two  instruments.  The 
performance  of  the  separation  algorithm  is  affected  by  how  similiar  the  two  sound 
sources  are.  The  similarities  are  considered  in  time  domain,  frequency  domain, 
intensity  level  and  time  delay.  Intensity  and  time  delay  are  determined  by  the 
source  and  microphone  position  and  will  be  discussed  in  a  later  section. 

Since  the  separation  algorithm  is  built  from  the  difference  based  on  the  statistical 
distribution,  if  the  original  signals  are  similar  in  distribution,  the  ICA  algorithm 
does  not  produce  good  separation  results.  Off-line  MATLAB  experiments  already 
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source  1 

man’s  voice 

man’s  voice 

man’s  voice 

source  2 

man’s  voice 

women’s  voice 

music 

index  E 

0.7133 

0.3364 

0.2247 

Table  5.1:  Simulation  result  of  sub-band  ICA  algorithm  on  different  type  of  sources 

show  that  separation  between  music  and  voices  is  better  than  two  different  voice 
of  the  same  sex  (see  Table  5.1). 

Our  work  with  real-world  recordings  has  produced  results  consistent  with  the 


Table  5.2:  Simulation  result  of  real-time  separation  of  real-world  recording  on 
different  type  of  sources 


48 


5.2  The  effect  of  source  distance  and  angle 


In  this  experiment  we  want  to  show  that  the  delay  in  transmission  and  reflection 
can  not  be  omitted  for  real  world  signal.  That  means  the  instantaneous  mixing 
model  can  only  produce  good  result  under  certain  condition.  The  best  results 
can  be  expected  when  the  position  of  source  and  alignment  of  microphone  work 
together  so  that  both  source  arrive  the  two  microphone  at  the  same  time.  This  area 
is  shown  in  grey  in  Figure  5.3.  Unfortunately  since  the  signal  from  this  area  arrive 


Areas  that 
instantaneous 
mixing  model 
works 


Layout  of  Experiment 


Figure  5.3:  The  layout  of  Experiment 

at  both  microphones  via  similar  paths  and  distances,  the  intensities  do  not  have 
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much  difference  either.  Thus  the  signals  are  mixed  by  a  ill-conditioned  matrix.  In 
such  situations,  separation  can  be  performed  but  the  performance  and  convergence 
speed  are  poor.  By  comparing  the  separation  output  of  an  instantaneously  mixed 
signal  and  the  real-world  mixed  signal  in  position  B  (same  signal  source  but  in 
different  time  zone)  on  Figure  5.4,  the  disadvantage  of  instantaneously  mixing 
model  can  be  seen. 


Figure  5.4:  Comparing  the  instantaneous  mixing  model  and  the  real-world  mixing 
model 

When  the  robot  is  moving  around  and  sources  stay  in  a  fixed  position,  the  quality 
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of  the  separated  signal  changes  dramatically  depending  to  the  robot  position.  The 
placement  of  source  and  robot  are  shown  in  Figure  5.3,  and  the  performance  index 
at  four  positions  are  shown  in  Table  5.3.  The  output  wave  form  in  position  A,  C 
are  showed  in  Figure  5.5  and  5.6  respectively. 


Figure  5.5:  Separation  of  real-world  mixture  in  in  position  A 

In  all  positions,  the  separation  algorithm  is  still  working,  that  is,  the  separated 
waveforms  show  the  characteristic  of  a  voice  signal  and  a  music  signal.  However, 
when  played  out,  the  acoustic  effect  is  not  ideal.  The  convolution  effect  is  obvious, 
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Figure  5.6:  Separation  of  real-world  mixture  in  position  B 


position 

A 

B 

C 

D 

LPC  index 

0.88948 

0.0407 

0.1953 

1.0311 

Table  5.3:  Performance  index  of  sub-band  ICA  on  real-world  recording  on  different 
source  angle 
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and  the  music  sounds  in  a  lower  tone  than  it  should.  Besides  the  voice  signal  is 
not  clear. 

Thus  we  need  the  convolution  model  to  achieve  better  and  more  robust  perfor¬ 
mance.  With  the  deconvolution  algorithm,  separation  results  show  dramatic  im¬ 
provement.  The  following  figure  5.7  shows  the  output  from  deconvolution  algo¬ 
rithm  under  the  same  conditions  as  described  before.  The  performance  index  of 


original  sound  source  1 


xIO4 


Figure  5.7:  Result  of  deconvolution  algorithm  in  position  A  and  B 

sub-band  deconvolution  algorithm  at  the  same  placement  of  source  and  robot  as 
Table  5.3  is  shown  in  Table  5.4. 
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position 

A 

B 

C 

D 

LPC  index 

5.4293 

6.0226 

4.9812 

5.3354 

Table  5.4:  Performance  index  of  sub-band  deconvolution  on  real- world  recording 
on  different  source  angle 

5.3  The  effect  of  room  acoustics 

Even  if  the  experiment  is  restricted  to  a  closed  room  and  a  fixed  environment, 
the  mixing  hlter  still  change  dramatically  with  repect  to  the  position  of  the  micro¬ 
phones  and  sources.  Reflection  has  a  large  effect  on  the  mixing  hlter.  The  following 
plots  show  the  impulse  responses  of  the  demixing  hlter  at  position  B  in  Figure  5.3. 
The  hlter  shape  is  pretty  random.  It  is  reasonable  since  the  distance  between  the 
two  microphones  is  approximately  10  cm,  corresponding  to  140  data  samples  at 
8KHz  and  a  10  tap  hlter  is  just  too  short  to  show  the  convolution  process.  When 
both  sensor  and  source  are  placed  close  to  the  wall  the  effect  of  reflection  should  be 
very  strong.  However,  because  of  the  limited  computation  power  and  memory,  the 
hlter  tap  in  simulation  cannot  more  than  50  while  the  delay  caused  by  refection 
is  generally  much  more  than  that  and  thus  the  reflection  effect  does  not  appear  in 
the  simulations  so  far. 


5.4  Over-complete  and  under-complete  mixtures 

In  a  realistic  environment  the  number  of  sound  sources  may  change  from  time  to 
time.  When  a  voice  signal  is  studied,  for  example,  the  pause  between  words  and 
sentences  may  be  long  enough  that  the  algorithm  should  consider  the  signal  source 
as  having  been  turned  oh.  When  there  are  more  sources  than  microphones,  we  call 
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it  an  under-complete  situation;  when  there  are  more  microphones  that  sources,  we 
call  it  an  over-complete  situation. 

Perfect  separation  in  the  under-complete  case  is  impossible.  In  this  case,  the  sep¬ 
aration  algorithm  will  put  the  two  (or  more)  most  similar  sources  as  one.  How  to 
determine  the  similarity  is  algorithm  dependent.  In  the  Kuicnet  method,  it  means 
the  sources  have  the  closest  kurtosis;  in  the  informax  method,  it  means  the  sources 
have  the  closest  distribution  function. 

In  the  over-complete  case  there  is  more  information  available  and  thus  better  sepa¬ 
ration  results  can  be  expected  if  the  number  of  sources  is  known  and  the  algorithm 
is  adapted  to  this  information  [19] .  In  addition,  operations  can  be  added  to  remove 
noise,  thus  producing  a  clearer  separation  result. 

In  this  project,  the  case  of  four  sensors  and  two  signal  sources  is  studied.  The 
output  contains  an  estimation  of  the  two  original  signals  and  of  two  channels  of 
noise.  It  is  interesting  to  realize  that  the  two  channels  of  noise  have  about  the 
same  intensity  as  the  estimated  source  signals. 

5.5  Conclusion  and  future  work 

In  the  work  I  presented  here,  all  current  major  approaches  for  BSS/BSD  problem 
are  viewed  and  compared  and  a  real-time  implementation  of  sub-band  natural 
gradient  method  is  provided  which  works  well  for  real-world  recorded  voice  and 
music  signals.  By  experimenting  with  different  sensors  and  source  settings,  we 
learned  more  about  how  sound  is  transmitted  in  a  closed  room,  how  sounds  are 
mixed  and  convolved  and  how  to  measure  the  performance  of  separation. 

Several  problem  still  needs  further  work  to  achieve  better  results: 

1.  Denoising.  The  real- world  signal  contains  noise  from  many  sources,  for  ex- 
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ample  the  noise  coming  from  the  microphone,  and  the  motor  and  fan  on  the 
robot.  The  noise  is  significant  in  the  recorded  mixture  especially  when  the 
robot  is  moving.  Although  some  wavelet  denoising  methods  are  already  ap¬ 
plied  in  the  algorithm,  the  noise  is  not  removed  completely.  Other  methods 
must  be  considered  if  a  clearer  output  is  desired. 

2.  Speed  of  computation.  For  the  instantaneous  mixing  model,  the  slave  DSP 
can  give  an  improved  demixing  matrix  for  every  input  and  when  the  robot  is 
moving  or  the  mixing  matrix  has  changed  for  some  other  reason,  the  output 
can  follow  the  change.  For  the  deconvolution  model,  the  learning  process 
needs  more  time  and  cannot  follow  the  change  quickly  enough.  More  works 
need  to  be  done  to  increase  the  speed  of  the  algorithm,  by  either  distributing 
the  computational  load  on  the  DSP  chips  more  evenly,  rewriting  more  code 
in  assembly,  or  by  employing  more  advanced  algorithms  to  do  the  wavelet 
filtering  job. 

3.  Signal  extraction.  It  would  be  ideal  if  the  algorithm  could  extract  certain 
kinds  of  signals,  for  example  the  voice  of  one  person,  from  a  mixture.  This 
requires  the  use  of  the  signal  characteristics  in  a  more  thorough  way  than  is 
currently  done. 

4.  Sound  localization.  Currently,  the  sound  separation  algorithm  can  not  be 
combined  with  a  localization  algorithm  because  the  phase  information,  which 
is  essential  to  localization,  is  damaged  in  the  process  of  separation.  It  will 
be  a  interesting  topic  to  try  combine  the  two  problems  together  so  that  the 
separation  can  provide  more  information  for  localization  or  sound  tracking. 
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On  the  theoretical  side  of  the  BSS/BSD  problem,  many  problems  remain  open, 
such  as  choice  of  the  best  non-linear  function,  convergence  and  stability  analysis, 
and  adapting  the  algorithm  to  different  sources  and  sensor  settings.  These  will 
attract  more  more  interest  form  researchers. 

We  hope  to  have  provided  a  good  review  of  the  BSS/BSD  problem  and  an  im¬ 
plementation  that  provides  robust  results  so  that  future  workers  in  this  held  can 
build  on  our  work  and  achieve  even  better  results. 
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Appendix  A 


Software  Structure 

A.l  Connection  between  Coreco  board  and  NT 
Host 

The  host  program  have  two  major  part,  executable  netsrv.exe  and  dynamic  link 
library  dspsrv.dll.  The  library  dspsrv.dll  handles  all  the  requests  from  DSP  board. 
Some  important  functions  it  provides  are:  CreateCorecoServerQ.  Coreco  Server  is 
a  program  that  reside  on  NT.  Figure  A.l  describes  how  the  host  program  netsrv.exe 
interact  with  the  code  loaded  on  DSP  board  natsep.out. 

Netsrv.exe  call  dspsrv.dll  behind  the  scene  for  all  the  message  handling  during 
run  time. 
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Figure  A.l:  Host  program  and  server  program 
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A. 2  Sub-band  ICA  algorithm 


A. 2.1  Data  structure 

To  Implement  wavelet  filter  bank,  data  structure  for  interbank  and  circular  buffer 
are  defined  as  follows: 

1.  stereo  sound  data  structure 

typedef  struct  { 

WAVE_T  ch[2]; 

}  WAVEJSTEREO; 

The  one  sample  of  sound  signal  is  stored  as  two  16  bits  integer.  In 
each  transaction  a  block  of  512  sound  signal  samples  are  transferred 
from  robot  to  NT  host  as  an  array  of  WAVEJSTEREO  structure. 

Filter  structure  The  information  about  filter  bank  is  stored  in  a  structure 
called 

2.  FILTER_HEAD. 

This  structure  contains  the  information  about  filter  bank.  There 
are  four  fields  in  this  structure.  Ltype  is  the  enumeration  of  filter 
types;  depth  means  the  depth  of  wavelet  filter  bank  tree  structure; 
order  means  the  order  of  the  tree  structure,  in  this  project,  the 
value  is  always  2;  nDwords  mean  the  length  of  filter  coefficient 
vector. 

typedef  struct  { 


61 


UINT32  Ltype; 

UINT32  depth; 

UINT32  order; 

UINT32  nDwords; 

}  FILTERJdEAD; 

3.  Structure  COEF 

COEF  is  the  filter  coefficient  structure.  It’s  have  two  fields,  point¬ 
ing  to  the  high  pass  filter  coefficient  array  and  low  pass  filter  coef¬ 
ficient  array  separately, 
typedef  struct  { 

COEF_T  *H,*L; 

}  COEF; 

4.  Filter  Bank 

Data  structure  for  filter  bank  use  both  structure  FILTERJdEAD 
and  COEF. 
typedef  struct  { 

FILTERJdEAD  *hdr; 

COEF  *coef; 

}  COEF_BANK; 

5.  Circular  buffer 

Circular  buffer  is  a  very  important  idea  of  implementing  digital 
filter.  In  this  project,  to  improve  performance,  we  try  to  implement 
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filtering  and  down  sampling  in  one  step.  Thus  the  circular  buffer 
is  different  than  standard, 
typedef  struct  { 

CIRCULAFLT  *ptr,  *top,  ^bottom; 

UINT32  blockJeng; 

}  CIRCULAR; 

The  structure  CIRCULAR  allocates  a  whole  trunk  of  memory 
that  can  be  divided  into  several  blocks  each  has  equal  length  of 
blockJeng.  When  we  go  through  the  tree  structure  of  wavelet 
filter-bank,  on  each  level,  the  original  circular  buffer  is  divided 
into  two  block  contained  down  sampled  data  from  high  pass  filter 
and  low  pass  filter  separately.  Figure  A. 2  shows  the  structure  of 
the  sub-band  ICA  method. 

6.  Matrix 

To  implement  ICA  algorithm,  matrix  structure  is  essential.  The 
structure  of  matrix  is  defined  as  follows 
typedef  struct  { 

int  row_order; 
int  colunm_order; 
float**  data; 

}mymatrix; 

Matrix  multiplication,  addition,  inverse,  copy  and  matrix  norm 
functions  are  also  defined  based  on  this  matrix  structure. 
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circular  buffer 


Figure  A. 2:  Circular  buffer  and  wavelet  filter  bank 
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A. 2. 2  Algorithm 


The  structure  of  sub-band  ICA  algorithm  is  described  in  Chapter  3  of  thesis.  Here 
we  will  explain  how  to  implement  the  algorithm  on  two  DSP  board.  The  graph 
below  describes  how  two  DSP  board,  the  master  DSP  and  slave  DSP  interact  with 
each  other.  Figure  A. 3  shows  the  structure  of  the  sub-band  ICA  method. 

A. 3  File  Structure 

Main.c:  initialize  memory,  call  function  openboard()  to  set  the  connection  be¬ 
tween  Coreco  board  and  NT  machine.  Then  the  program  goes  into  an  infinite 
loop.  Inside  the  loop,  stereo  sound  data  is  read,  and  function  app()  is  called  to 
apply  wavelet  filtering  on  the  data.  For  the  result,  call  sort()  according  to  av¬ 
erage  power  and  than  call  ICA  algorithm  to  separate.  Dbbank.c:  contains  the 
wavelet  filter  bank  function  app().  Firfilter.c:  contains  all  the  functions  related 
to  filtering  calculation  which  will  be  used  by  app().  Cluster. c:  contains  the  neu¬ 
ral  network  clustering  algorithm.  Matrix. c:  contains  all  the  functions  related  to 
matrix  structure  and  matrix  operations.  Ica.c:  contains  the  ICA  learning  algo¬ 
rithm.  Code  with  detailed  commented  is  available  under  project  directory  / -./de¬ 
partment /isr /labs/ isl/ Projects /subica/C -code /comment 
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Figure  A. 3:  Master  DSP  program  and  slave  DSP  program 
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Appendix  B 


Project  Directory 


All  the  source  code,  data  file  and  pictures  mentioned  in  the  thesis  is  store  under 
directory  /:/department/isr/labs/isl/Projects/yumao. 

Below  is  a  brief  description  of  directory  structure  and  content  inside.  All  the  bold 
character  means  directory  names. 

C_code: 

Comment: 

the  commented  code  to  illustrate  the  basic  idea  of  imple¬ 
menting  sub-band  ICA  on  one  DSP  board. 

WorkingJca: 

Netsep_m:  master  DSP  code 
Netsep_s:  slave  DSP  code 

Load  the  two  programs  into  two  DSP  for  a  sub-band 
ICA  implementation  The  program  for  master  DSP 
read  data,  sub-band  filtering  and  apply  separating 
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matrix  on  data.  The  program  for  slave  DSP  applies 
ICA  algorithm  on  filtered  data  to  get  the  separating 
matrix. 

Working_dec: 

Netsep_m:  master  DSP  code 
Netsep_s:  slave  DSP  code 

Load  the  two  programs  into  two  DSP  boards  for  a 
sub-band  deconvolution  implementation.  The  pro¬ 
gram  for  master  DSP  read  data,  sub-band  filtering 
and  apply  deconvolution  filter  on  data.  The  pro¬ 
gram  for  slave  DSP  applies  deconvolution  algorithm 
on  filtered  data  to  get  the  deconvolution  filter. 

Coreco_code: 

Netsrv.exe  and  dspsrv.dll:  the  NT  host  program 

Data: 

This  sub-directory  stores  experimental  result.  All  the  sound 
data  is  stored  as  binary  file.  The  MATLAB  file  datareader.m 
provide  an  example  of  reading  the  stereo  sound  data  and 
make  it  acceptable  for  MATLAB  sound  function.  Note  that 
for  NT  and  LInix  system  the  file  format  has  a  little  difference. 
Pictures  used  in  the  thesis  are  stored  in  postscript  format. 


Playout: 


An  executable  to  play  the  output  sound  signal  real  time.  The 
program  use  socket  to  get  data  from  host  program,  then  it 
write  the  data  directly  in  to  buffer  of  sound  card  to  enable  a 
smooth  sound  playing. 
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